
Using HuggingFace Datasets in evaluations with `preprocess_model_input`

Note: This is a temporary workaround

This guide demonstrates a workaround for using HuggingFace Datasets with Weave evaluations.

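We are actively working on a more seamless integration that will simplify this process.
The approach shown here works, but expect improvements and updates in the near future that make working with external datasets easier.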

Setup and imports

First, we initialize Weave and connect to Weights & Biases to track experiments.

!pip install datasets wandb weave nest_asyncio
# Initialize variables
HUGGINGFACE_DATASET = "wandb/ragbench-test-sample"
WANDB_KEY = ""
WEAVE_TEAM = ""
WEAVE_PROJECT = ""

# Init weave and required libraries
import asyncio

import nest_asyncio
import wandb
from datasets import load_dataset

import weave
from weave import Evaluation

# Login to wandb and initialize weave
wandb.login(key=WANDB_KEY)
client = weave.init(f"{WEAVE_TEAM}/{WEAVE_PROJECT}")

# Apply nest_asyncio to allow nested event loops (needed for some notebook environments)
nest_asyncio.apply()
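
Hardcoding an API key in a notebook makes it easy to leak. As a minimal sketch, the key can instead be read from the environment. `WANDB_API_KEY` is the environment variable that wandb itself checks; the helper name `resolve_wandb_key` is hypothetical and introduced here only for illustration.

```python
import os
from typing import Optional


def resolve_wandb_key(explicit_key: str = "") -> Optional[str]:
    """Return the explicit key if set, else fall back to WANDB_API_KEY.

    Hypothetical helper for illustration; not part of wandb or weave.
    """
    key = explicit_key or os.environ.get("WANDB_API_KEY", "")
    # Return None for an empty value so the caller can decide how to proceed
    return key or None
```

The result could then be passed along as `wandb.login(key=resolve_wandb_key(WANDB_KEY))`. When no key is given, `wandb.login` should fall back to its own credential lookup, so leaving `WANDB_KEY` empty would not force a key into the notebook.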

Load and prepare the HuggingFace dataset

We load a HuggingFace dataset.
We create an index mapping that references the dataset rows.
This index approach lets us keep references to the original dataset.

Note:
In the index, we encode hf_hub_name along with hf_id so that each row has a unique identifier.
This unique digest value is used to track and reference specific dataset entries during evaluations.

# Load the HuggingFace dataset
ds = load_dataset(HUGGINGFACE_DATASET)
row_count = ds["train"].num_rows

# Create an index mapping for the dataset
# This creates a list of dictionaries with HF dataset indices
# Example: [{"hf_id": 0, "hf_hub_name": "wandb/ragbench-test-sample"}, {"hf_id": 1, ...}, ...]
hf_index = [{"hf_id": i, "hf_hub_name": HUGGINGFACE_DATASET} for i in range(row_count)]
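
To see why the hub name is stored next to the row index, consider a content hash over each index row. The sketch below is only an illustration: `row_digest` is a hypothetical helper, the second dataset name is made up, and the hashing scheme is an assumption rather than the digest Weave actually computes.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Hash a row's content; illustrative only, not Weave's actual digest scheme."""
    # sort_keys makes the serialization independent of key order
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two different datasets that both have a row 0 (the second name is hypothetical)
row_a = {"hf_id": 0, "hf_hub_name": "wandb/ragbench-test-sample"}
row_b = {"hf_id": 0, "hf_hub_name": "example-org/other-dataset"}

# Without the hub name, both rows would reduce to {"hf_id": 0} and collide
assert row_digest({"hf_id": 0}) == row_digest({"hf_id": 0})
# With the hub name, the same position in different datasets stays distinct
assert row_digest(row_a) != row_digest(row_b)
```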

Define processing and evaluation functions

Processing pipeline

preprocess_example: Transforms the index reference into the actual data needed for evaluation
hf_eval: Defines how to score the model outputs
function_to_evaluate: The actual function/model being evaluated

@weave.op()
def preprocess_example(example):
    """
    Preprocesses each example before evaluation.
    Args:
        example: Dict containing hf_id
    Returns:
        Dict containing the prompt and the reference answer from the HF dataset
    """
    hf_row = ds["train"][example["hf_id"]]
    return {"prompt": hf_row["question"], "answer": hf_row["response"]}

@weave.op()
def hf_eval(hf_id: int, output: dict) -> dict:
    """
    Scoring function for evaluating model outputs.
    Args:
        hf_id: Index in the HF dataset
        output: The output from the model to evaluate
    Returns:
        Dict containing evaluation scores
    """
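    hf_row = ds["train"][hf_id]  # Row is available here for comparison against output
    return {"scorer_value": True}  # Placeholder score; replace with a real metric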

@weave.op()
def function_to_evaluate(prompt: str):
    """
    The function that will be evaluated (e.g., your model or pipeline).
    Args:
        prompt: Input prompt from the dataset
    Returns:
        Dict containing model output
    """
    return {"generated_text": "testing "}
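
The `hf_eval` scorer above always returns `True`, so it is only a placeholder. A minimal sketch of a scorer that uses the looked-up row might compare the generated text with the reference `response` field. The in-memory `rows` list stands in for `ds["train"]`, the sample texts are invented, and `exact_match_score` and `normalize` are hypothetical names. The `@weave.op()` decorator is omitted so the snippet runs without Weave.

```python
# Stand-in for ds["train"]; the contents are invented for illustration
rows = [
    {"question": "What is 2 + 2?", "response": "4"},
    {"question": "Capital of France?", "response": "Paris"},
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_score(hf_id: int, output: dict) -> dict:
    """Compare the model's generated text with the reference answer for row hf_id."""
    reference = rows[hf_id]["response"]
    generated = output.get("generated_text", "")
    return {"exact_match": normalize(generated) == normalize(reference)}
```

Exact match is a deliberately simple metric; it is used here only because it makes the lookup by `hf_id` easy to follow.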

Create and run the evaluation

For each index in hf_index:
1. preprocess_example gets the corresponding data from the HF dataset.
2. The preprocessed data is passed to function_to_evaluate.
3. The output is scored using hf_eval.
4. Results are tracked in Weave.

# Create evaluation object
evaluation = Evaluation(
    dataset=hf_index,  # Use our index mapping
    scorers=[hf_eval],  # List of scoring functions
    preprocess_model_input=preprocess_example,  # Function to prepare inputs
)

# Run evaluation asynchronously
async def main():
    await evaluation.evaluate(function_to_evaluate)

asyncio.run(main())
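
To make the per-row flow concrete, the four steps can be sketched in plain Python without Weave. Everything here is a stand-in: `rows` replaces `ds["train"]` with invented data, and `preprocess`, `model`, `scorer`, and `call_with_matching_args` are hypothetical names. Matching arguments to parameters by name is inferred from the function signatures in this guide, not taken from Weave's implementation.

```python
import inspect

# Stand-in for ds["train"]; the contents are invented for illustration
rows = [
    {"question": "What is 2 + 2?", "response": "4"},
    {"question": "Capital of France?", "response": "Paris"},
]
index = [{"hf_id": i, "hf_hub_name": "stand-in/dataset"} for i in range(len(rows))]


def preprocess(example: dict) -> dict:
    """Turn an index row into the actual fields the model needs."""
    row = rows[example["hf_id"]]
    return {"prompt": row["question"], "answer": row["response"]}


def model(prompt: str) -> dict:
    """Toy model: echoes the prompt in upper case."""
    return {"generated_text": prompt.upper()}


def scorer(hf_id: int, output: dict) -> dict:
    """Toy scorer: checks only that some text was generated."""
    return {"non_empty": bool(output["generated_text"])}


def call_with_matching_args(fn, available: dict):
    """Call fn with only the entries of `available` that fn names as parameters."""
    names = inspect.signature(fn).parameters
    return fn(**{k: v for k, v in available.items() if k in names})


results = []
for example in index:
    model_input = preprocess(example)                     # step 1: resolve the index row
    output = call_with_matching_args(model, model_input)  # step 2: run the model
    score = call_with_matching_args(scorer, {**example, "output": output})  # step 3: score
    results.append({"example": example, "output": output, "score": score})  # step 4: record
```

Note that the scorer receives the original index row (with `hf_id`), not the preprocessed input, which is why `hf_eval` looks the row up again itself.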
